Large Language Models and 3D Vision for Intelligent Robotic Perception and Autonomy
Mehta, Vinit, Sharma, Charu, Thiyagarajan, Karthick
With the rapid advancement of artificial intelligence and robotics, the integration of Large Language Models (LLMs) with 3D vision is emerging as a transformative approach to enhancing robotic sensing technologies. This convergence enables machines to perceive, reason and interact with complex environments through natural language and spatial understanding, bridging the gap between linguistic intelligence and spatial perception. This review provides a comprehensive analysis of state-of-the-art methodologies, applications and challenges at the intersection of LLMs and 3D vision, with a focus on next-generation robotic sensing technologies. We first introduce the foundational principles of LLMs and 3D data representations, followed by an in-depth examination of 3D sensing technologies critical for robotics. The review then explores key advancements in scene understanding, text-to-3D generation, object grounding and embodied agents, highlighting cutting-edge techniques such as zero-shot 3D segmentation, dynamic scene synthesis and language-guided manipulation. Furthermore, we discuss multimodal LLMs that integrate 3D data with touch, auditory and thermal inputs, enhancing environmental comprehension and robotic decision-making. To support future research, we catalog benchmark datasets and evaluation metrics tailored for 3D-language and vision tasks. Finally, we identify key challenges and future research directions, including adaptive model architectures, enhanced cross-modal alignment and real-time processing capabilities, which pave the way for more intelligent, context-aware and autonomous robotic sensing systems.
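To make the "3D data representations" the review surveys more concrete, here is a minimal sketch (not from the paper) converting a point cloud, one common representation, into an occupied voxel grid, another, using plain NumPy:

```python
import numpy as np

def voxelize(points, voxel_size=0.5):
    """Convert an (N, 3) point cloud to a set of occupied voxel indices."""
    # Shift points so the minimum corner sits at the origin,
    # then bucket each point into an integer voxel coordinate.
    origin = points.min(axis=0)
    idx = np.floor((points - origin) / voxel_size).astype(int)
    # Deduplicate: many points can fall into the same voxel.
    return np.unique(idx, axis=0)

# Toy cloud: 4 points, two of which share a voxel at voxel_size=1.0
cloud = np.array([[0.1, 0.2, 0.3],
                  [0.4, 0.1, 0.2],
                  [1.5, 0.0, 0.0],
                  [0.0, 2.2, 0.0]])
occupied = voxelize(cloud, voxel_size=1.0)
print(occupied)  # 3 occupied voxels
```

The same trade-off the review discusses shows up even here: the point cloud keeps exact coordinates, while the voxel grid discretizes them into a regular structure that downstream 3D networks can consume.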
Computer Vision based group activity detection and action spotting
Sivalingam, Narthana, Sivasthigan, Santhirarajah, Mahendranathan, Thamayanthi, Godaliyadda, G. M. R. I., Ekanayake, M. P. B., Herath, H. M. V. R.
Group activity detection in multi-person scenes is challenging due to complex human interactions, occlusions, and variations in appearance over time. This work presents a computer vision based framework for group activity recognition and action spotting using a combination of deep learning models and graph based relational reasoning. The system first applies Mask R-CNN to obtain accurate actor localization through bounding boxes and instance masks. Multiple backbone networks, including Inception V3, MobileNet, and VGG16, are used to extract feature maps, and RoIAlign is applied to preserve spatial alignment when generating actor specific features. The mask information is then fused with the feature maps to obtain refined masked feature representations for each actor. To model interactions between individuals, we construct Actor Relation Graphs that encode appearance similarity and positional relations using methods such as normalized cross correlation, sum of absolute differences, and dot product. Graph Convolutional Networks operate on these graphs to reason about relationships and predict both individual actions and group level activities. Experiments on the Collective Activity dataset demonstrate that the combination of mask based feature refinement, robust similarity search, and graph neural network reasoning leads to improved recognition performance across both crowded and non crowded scenarios. This approach highlights the potential of integrating segmentation, feature extraction, and relational graph reasoning for complex video understanding tasks.
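As a rough illustration (a generic sketch, not the paper's implementation), the relation-graph step described above, dot-product appearance similarity between per-actor features, normalized into edge weights and fed through one graph-convolution layer, can be written in a few lines of NumPy:

```python
import numpy as np

def relation_graph(features):
    """Build an actor relation graph from per-actor appearance features.

    features: (num_actors, d) array. Edge weights are cosine similarities
    (normalized dot products), softmax-normalized per row so each actor's
    relations sum to 1.
    """
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = f @ f.T                                   # pairwise appearance similarity
    e = np.exp(sim - sim.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def gcn_layer(adj, features, weight):
    """One graph-convolution step: aggregate neighbors, project, ReLU."""
    return np.maximum(adj @ features @ weight, 0.0)

rng = np.random.default_rng(0)
actors = rng.normal(size=(5, 16))                   # 5 actors, 16-dim features
G = relation_graph(actors)
out = gcn_layer(G, actors, rng.normal(size=(16, 8)))
print(out.shape)                                    # (5, 8)
```

In the full system these features would come from the masked RoIAlign outputs, and the GCN outputs would feed both the individual-action and group-activity heads.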
GRAM-R$^2$: Self-Training Generative Foundation Reward Models for Reward Reasoning
Wang, Chenglong, Mu, Yongyu, Zhou, Hang, Huo, Yifu, Zhu, Ziming, Zeng, Jiali, Yang, Murun, Li, Bei, Hao, Xiaoyang, Zhang, Chunliang, Meng, Fandong, Zhu, Jingbo, Xiao, Tong
Significant progress in reward modeling over recent years has been driven by a paradigm shift from task-specific designs towards generalist reward models. Despite this trend, developing effective reward models remains a fundamental challenge: the heavy reliance on large-scale labeled preference data. Pre-training on abundant unlabeled data offers a promising direction, but existing approaches fall short of instilling explicit reasoning into reward models. To bridge this gap, we propose a self-training approach that leverages unlabeled data to elicit reward reasoning in reward models. Based on this approach, we develop GRAM-R$^2$, a generative reward model trained to produce not only preference labels but also accompanying reward rationales. GRAM-R$^2$ can serve as a foundation model for reward reasoning and can be applied to a wide range of tasks with minimal or no additional fine-tuning. It can support downstream applications such as response ranking and task-specific reward tuning. Experiments on response ranking, task adaptation, and reinforcement learning from human feedback demonstrate that GRAM-R$^2$ consistently delivers strong performance, outperforming several strong discriminative and generative baselines.
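Schematically, one round of the kind of self-training the abstract describes, pseudo-labeling unlabeled preference pairs with the current model and keeping only confident labels together with a rationale, might look like the following. This is a generic sketch with made-up names, not GRAM-R$^2$'s actual procedure:

```python
def self_train(model_score, unlabeled_pairs, threshold=0.9):
    """One self-training round over unlabeled response pairs.

    model_score(a, b) -> probability that response `a` is preferred over `b`
    (a hypothetical stand-in for the current reward model). Pairs the model
    judges confidently become training data, paired with a placeholder
    rationale string; uncertain pairs are dropped rather than risk noise.
    """
    pseudo_labeled = []
    for a, b in unlabeled_pairs:
        p = model_score(a, b)
        if p >= threshold:
            pseudo_labeled.append((a, b, "chosen", f"preferred with p={p:.2f}"))
        elif p <= 1 - threshold:
            pseudo_labeled.append((a, b, "rejected", f"dispreferred with p={p:.2f}"))
    return pseudo_labeled

# Toy scorer (hypothetical): prefers the longer response.
toy = lambda a, b: 0.95 if len(a) > len(b) else 0.5
pairs = [("a detailed answer", "short"), ("a", "b")]
labeled = self_train(toy, pairs)
print(labeled)
```

The confidence threshold is what keeps the loop from amplifying its own mistakes: only pairs the model already judges decisively get recycled as training signal.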
Supplementary Material for LayoutGPT: Compositional Visual Planning and Generation with Large Language Models
Anonymous Author(s)

A. Implementation Details

Table 1: The prepending instructions provided to GPT-3.5/4 during LayoutGPT's 2D and 3D layout planning.

2D Layout Planning Instruction: Given a sentence prompt that will be used to generate an image, plan the layout of the image. Formally, each line should be like "object {width: ?px; height: ?px; left: ?px; top: ?px;}".

3D Layout Planning Instruction: Formally, each line should follow the template: FURNITURE {length: ?px;
LayoutGPT: Compositional Visual Planning and Generation with Large Language Models
Attaining a high degree of user controllability in visual generation often requires intricate, fine-grained inputs like layouts. However, such inputs impose a substantial burden on users compared to simple text inputs. To address this issue, we study how Large Language Models (LLMs) can serve as visual planners by generating layouts from text conditions, and thus collaborate with visual generative models. We propose LayoutGPT, a method that composes in-context visual demonstrations in style-sheet language to enhance the visual planning skills of LLMs.
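The "style sheet language" layouts described here are CSS-like lines of the form shown in the supplementary material. A minimal parser sketch (a hypothetical helper, not code from the paper) that recovers an object's bounding box from one such line:

```python
import re

def parse_layout_line(line):
    """Parse one CSS-style layout line, e.g.
    'cat {width: 120px; height: 80px; left: 10px; top: 30px;}'
    into (object_name, {property: pixel_value}).
    """
    name, body = line.split("{", 1)
    props = {m.group(1): int(m.group(2))
             for m in re.finditer(r"(\w+)\s*:\s*(\d+)px", body)}
    return name.strip(), props

obj, box = parse_layout_line(
    "cat {width: 120px; height: 80px; left: 10px; top: 30px;}")
print(obj, box)  # cat {'width': 120, 'height': 80, 'left': 10, 'top': 30}
```

Emitting layouts in a format LLMs have seen abundantly during pretraining (CSS) is the key design choice: the model's planning output is then trivially machine-parseable and can condition a downstream image generator.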
Beyond Static Responses: Multi-Agent LLM Systems as a New Paradigm for Social Science Research
Haase, Jennifer, Pokutta, Sebastian
As large language models (LLMs) transition from static tools to fully agentic systems, their potential for transforming social science research has become increasingly evident. This paper introduces a structured framework for understanding the diverse applications of LLM-based agents, ranging from simple data processors to complex, multi-agent systems capable of simulating emergent social dynamics. By mapping this developmental continuum across six levels, the paper clarifies the technical and methodological boundaries between different agentic architectures, providing a comprehensive overview of current capabilities and future potential. It highlights how lower-tier systems streamline conventional tasks like text classification and data annotation, while higher-tier systems enable novel forms of inquiry, including the study of group dynamics, norm formation, and large-scale social processes. However, these advancements also introduce significant challenges, including issues of reproducibility, ethical oversight, and the risk of emergent biases. The paper critically examines these concerns, emphasizing the need for robust validation protocols, interdisciplinary collaboration, and standardized evaluation metrics. It argues that while LLM-based agents hold transformative potential for the social sciences, realizing this promise will require careful, context-sensitive deployment and ongoing methodological refinement. The paper concludes with a call for future research that balances technical innovation with ethical responsibility, encouraging the development of agentic systems that not only replicate but also extend the frontiers of social science, offering new insights into the complexities of human behavior.
Modern Deep Learning Approaches for Cricket Shot Classification: A Comprehensive Baseline Study
Cricket shot classification from video sequences remains a challenging problem in sports video analysis, requiring effective modeling of both spatial and temporal features. This paper presents the first comprehensive baseline study comparing seven different deep learning approaches across four distinct research paradigms for cricket shot classification. We implement and systematically evaluate traditional CNN-LSTM architectures, attention-based models, vision transformers, transfer learning approaches, and modern EfficientNet-GRU combinations on a unified benchmark. A critical finding of our study is the significant performance gap between claims in academic literature and practical implementation results. While previous papers reported accuracies of 96\% (Balaji LRCN), 99.2\% (IJERCSE), and 93\% (Sensors), our standardized re-implementations achieve 46.0\%, 55.6\%, and 57.7\% respectively. Our modern SOTA approach, combining EfficientNet-B0 with a GRU-based temporal model, achieves 92.25\% accuracy, demonstrating that substantial improvements are possible with modern architectures and systematic optimization. All implementations follow modern MLOps practices with PyTorch Lightning, providing a reproducible research platform that exposes the critical importance of standardized evaluation protocols in sports video analysis research.
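The EfficientNet-plus-GRU pipeline described above amounts to running per-frame CNN features through a recurrent cell and classifying the clip from the final hidden state. A toy NumPy sketch of that temporal-pooling structure (random weights stand in for trained EfficientNet-B0 features and GRU parameters; this is an illustration, not the paper's model):

```python
import numpy as np

def gru_step(x, h, W, U, b):
    """One GRU step. x: (d_in,), h: (d_h,). Gates stacked as [z, r, n]."""
    def sig(v):
        return 1.0 / (1.0 + np.exp(-v))
    z = sig(W[0] @ x + U[0] @ h + b[0])          # update gate
    r = sig(W[1] @ x + U[1] @ h + b[1])          # reset gate
    n = np.tanh(W[2] @ x + r * (U[2] @ h) + b[2])  # candidate state
    return (1 - z) * n + z * h

def classify_clip(frame_feats, params, W_out):
    """Run per-frame CNN features through a GRU; classify from final state."""
    h = np.zeros(params["U"][0].shape[0])
    for x in frame_feats:
        h = gru_step(x, h, params["W"], params["U"], params["b"])
    return int(np.argmax(W_out @ h))

rng = np.random.default_rng(1)
d_in, d_h, n_cls, T = 32, 16, 10, 8              # e.g. 10 shot classes, 8 frames
params = {"W": rng.normal(size=(3, d_h, d_in)) * 0.1,
          "U": rng.normal(size=(3, d_h, d_h)) * 0.1,
          "b": np.zeros((3, d_h))}
frames = rng.normal(size=(T, d_in))              # stand-in for EfficientNet features
pred = classify_clip(frames, params, rng.normal(size=(n_cls, d_h)))
print(pred)
```

The separation matters for the study's reproducibility point: the spatial backbone and the temporal model can each be swapped independently and benchmarked under one protocol.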